Data workflows in RStudio Connect using pins

@javierluraschi / @rstudio

09/10/2019

Today

mlflow

The MLflow docs require manually downloading wine-quality.csv.

r2d3

The r2d3 package docs require downloading flare.csv.

readr

The readr package docs require downloading challenge.csv.

Reproducible?

Workarounds

Download File

Avoid Redownloading?
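
One common workaround, sketched here in base R, is to download the file only when it is not already on disk. The helper name cached_download and the data/flare.csv path in the usage note are hypothetical and not from the original slides.

```r
# Download `url` to `path` only if `path` does not already exist.
# `cached_download` is a hypothetical helper name used for illustration.
cached_download <- function(url, path) {
  if (!file.exists(path)) {
    # Create the target folder on first use.
    dir.create(dirname(path), recursive = TRUE, showWarnings = FALSE)
    download.file(url, path, mode = "wb", quiet = TRUE)
  }
  path
}
```

Usage would look like read.csv(cached_download(flare_url, "data/flare.csv")), where flare_url holds the dataset URL. This skips repeat downloads, but the cache lives inside one project and never notices when the upstream file changes.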

Support URLs in all packages?

# A tibble: 252 x 2
   id                                           value
   <chr>                                        <dbl>
 4 flare.analytics.cluster.AgglomerativeCluster  3938
 5 flare.analytics.cluster.CommunityStructure    3812
 6 flare.analytics.cluster.HierarchicalCluster   6714

Enough?

  • Add to .gitignore?
  • Share across projects?
  • Detect upstream changes?
  • Update or share datasets?

Pins

Cache

With pins we can easily cache resources:

"/Users/javierluraschi/Library/Caches/pins/local/flare/flare.csv"
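
A minimal sketch of how a path like the one above might be produced, assuming the pins API as it existed around the time of this talk (pin() returning the cached file path for file resources). A small local CSV, built from the three rows shown earlier, stands in for the remote flare.csv so the sketch runs offline; that stand-in file is an assumption made for illustration.

```r
library(pins)

# A small local CSV stands in for the remote flare.csv so the sketch runs
# offline; pin() also accepts a URL in the same position.
src <- file.path(tempdir(), "flare.csv")
writeLines(c(
  "id,value",
  "flare.analytics.cluster.AgglomerativeCluster,3938",
  "flare.analytics.cluster.CommunityStructure,3812",
  "flare.analytics.cluster.HierarchicalCluster,6714"
), src)

# pin() copies the resource into the local board's cache and returns the
# path of the cached copy.
cached_path <- pin(src, name = "flare")

# Read from the cache rather than from the original location.
flare <- read.csv(cached_path)
```

Because the cached copy lives outside the project folder, there is nothing to add to .gitignore and the same pin can be reused from other projects.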

But wait, there is more!

Intro

Functionality

You can use the pins package to:

  • Pin remote resources locally to work offline and cache results with ease: pin() stores resources in boards, which you can then retrieve with pin_get().
  • Discover new resources across different boards using pin_find().
  • Share resources on GitHub, Kaggle or RStudio Connect by registering new boards with board_register().
  • Resources can be anything from CSV, JSON, or image files to arbitrary R objects.
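
The first two bullets can be sketched as follows, assuming the pins API from the time of this talk and the default "local" board. The pin name and description are chosen for illustration only.

```r
library(pins)

# Pin an arbitrary R object (here a built-in data frame) to the default
# local board.
pin(mtcars, name = "mtcars", description = "Motor Trend car road tests")

# Retrieve it later, possibly from another project or session.
cars <- pin_get("mtcars")

# Search the local board for pins matching the given text.
found <- pin_find("mtcars", board = "local")
```

Here found is a data frame of matching pins, so it can be filtered like any other table.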

What can I pin?

Anything!

Where can I store pins?

Anywhere! Or rather, anywhere that implements the ‘board’ interface.

What is a board?

A storage location, like your local file system, GitHub, Kaggle or RStudio Connect.
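
As a rough sketch of the idea, assuming the board registration functions from the pins API of this period, a second local board can be registered against a temporary folder and then addressed by name. The board name "scratch", the repository, the server URL, and the environment variable name are all placeholders.

```r
library(pins)

# Register a second local board backed by a temporary folder, to show that
# a board is just a named storage location.
board_register_local(name = "scratch", cache = tempfile())

# Pin to, and read from, that board by name.
pin(iris, name = "iris", board = "scratch")
iris_copy <- pin_get("iris", board = "scratch")

# Remote boards are registered the same way. These calls need credentials,
# so they are left commented out; the repo, server, and key are placeholders.
# board_register_github(repo = "owner/repo")
# board_register_rsconnect(server = "https://connect.example.com",
#                          key = Sys.getenv("CONNECT_API_KEY"))
```

Once a remote board is registered, the same pin() and pin_get() calls apply; only the board argument changes.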

RStudio Connect

Pin

Discover

Share

Resources

Automation